Extraction of Layout Entities and Sub-layout Query-based Retrieval of Document Images
نویسندگان
چکیده
Layouts and sub-layouts constitute an important clue while searching a document on the basis of its structure, or when textual content is unknown/irrelevant. A sub-layout specifies the arrangement of document entities within a smaller portion of the document. We propose an efficient graph-based matching algorithm, integrated with hash-based indexing, to prune a possibly large search space. A user can specify a combination of sub-layouts of interest using sketch-based queries. The system supports partial matching for unspecified layout entities. We handle cases of segmentation pre-processing errors (for text/non-text blocks) with a symmetry maximization-based strategy, and accounting for multiple domain-specific plausible segmentation hypotheses. We show promising results of our system on a database of unstructured entities, containing 4776 newspaper images.
منابع مشابه
Layout Based Information Retrieval from Document Images
This research is intended to develop a layout based retrieval system for document image databases consisting of three phases: 1. At first, intelligent layout analysis algorithm has been designed to extract the layouts the document images physically with their edges and rectangles. 2. Every physically identified layout has been converted into a tree intermediary representation for indexing and s...
متن کاملModel-Guided Segmentation and Layout Labelling of Document Images Using a Hierarchical Conditional Random Field
We present a model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields (CRFs, hereafter). Common methods to classify a pixel of a document image into classes text, background and image are often noisy, and error-prone, often requiring post-processing through heuristic methods. The input to the system is a pixel-wise classification based on t...
متن کاملDocument Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
متن کاملDocument Image Retrieval Based on Layout Structural Similarity
In this paper, we describe issues related to the measurement of structural similarity between document images. We define structural similarity, and discuss the benefits of using it as a complement to content similarity for querying document image databases. We present an approach to computing a geometrically invariant structural similarity, and use this measure to search document image database...
متن کاملUnconstrained Tight Structure Extraction Using Voronoi Tesselation on Document Images
Document structure is the intermediary result obtained through page segmentation, which is used in the analysis of the document image. The structure serves the purpose of extracting the shape of the document from paragraph up to character level in a hierarchical exploratory methodology for understanding the layout structure of the document image. The extracted layout forms a dominant feature wh...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1609.02687 شماره
صفحات -
تاریخ انتشار 2016